Hassan Mojeed
LinkedIn Profile
mojeed.o.hassan@gmail.com
In this project, I will conduct an in-depth analysis of long-term rental pricing in Canada's booming real estate market. The primary goal is to uncover patterns related to apartment types, costs, and locations. By deciphering the dynamics that influence housing trends nationwide, this analysis aims to offer valuable insights for both tenants and investors.
The data used in this project was scraped from Kijiji, Canada's leading local classifieds platform, which facilitates the buying, selling, and trading of a wide range of items, including cars, real estate, jobs, and services. It operates as a subsidiary of eBay and has a significant presence in Canada, as well as in other countries such as Switzerland, Austria, and Japan.
I will be performing the following activities:
Looping: iteratively navigating through the result pages, and
Extracting vital information such as the apartment type, rental cost, location, and description.
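As a small sketch of the pagination step, the page-by-page URLs can be built up front from the same base URL and page range used in the scraping code below, without making any requests yet:

```python
import numpy as np

# Same base URL and category/location suffix used in the scraper
base_url = "https://www.kijiji.ca/b-apartments-condos/canada/page-"
url_separator = "/c37l0"

# Build the URL for each of the 46 result pages
urls = [base_url + str(n) + url_separator for n in np.arange(1, 47)]
print(len(urls))   # 46
print(urls[0])     # https://www.kijiji.ca/b-apartments-condos/canada/page-1/c37l0
```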
To efficiently parse the HTML content, I will use the BeautifulSoup library.
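To illustrate the parsing pattern on a self-contained example first (the HTML snippet below is an invented stand-in, not real Kijiji markup; Kijiji's own class names are auto-generated and appear in the scraper later), BeautifulSoup locates tags by name and class, then reads their text:

```python
from bs4 import BeautifulSoup

# Invented listing markup standing in for a real Kijiji page
html = """
<ul class="listings">
  <li><h3 class="title">2 Bedroom Condo</h3><div class="price">$2,100</div></li>
  <li><h3 class="title">Bachelor Suite</h3><div class="price">$1,450</div></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
for item in soup.find_all("li"):
    # Locate child tags by name and class, then read their text
    title = item.find("h3", {"class": "title"}).text
    price = item.find("div", {"class": "price"}).text
    print(title, price)
```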
The data processing and cleaning procedures involve several techniques. A custom function called extract_text pulls attribute-based text out of the page, while specific HTML tags are used to target the relevant information.
The data is transformed into a Pandas DataFrame after undergoing a cleaning process. The final DataFrame is then exported to a Google Sheet to facilitate further analysis, guaranteeing a consistent and easily accessible format for subsequent stages.
Utilizing Power BI, the exported data is transformed into visually appealing dashboards and impactful visual representations of key metrics. By harnessing the capabilities of Power BI, stakeholders gain a holistic view of rental patterns, facilitating informed decision-making in the ever-evolving real estate market.
# Import necessary libraries
from bs4 import BeautifulSoup  # BeautifulSoup is used for web scraping
import requests  # Used for making HTTP requests
import pandas as pd  # Pandas library for data manipulation and analysis
import numpy as np  # NumPy for the page-number range
from df2gspread import df2gspread as d2g  # Uploads DataFrames to Google Sheets
import os  # File-system paths for the credentials file
import gspread  # Google Sheets API client
from oauth2client.service_account import ServiceAccountCredentials  # Service-account auth
import warnings
warnings.filterwarnings("ignore")  # Suppress non-critical warnings
# Page numbers 1 through 46 of the Kijiji search results
page_numbers = np.arange(1, 47)
# Initialize an empty list to store apartment rental details
apartment_rental_details = []
# Loop through the pages
for page_number in page_numbers:
    # Define the base URL for Kijiji apartment listings
    base_url = "https://www.kijiji.ca/b-apartments-condos/canada/page-"
    # URL suffix for the apartments & condos category and location code
    url_separator = "/c37l0"
    # Construct the full URL for the current page
    url = base_url + str(page_number) + url_separator
    # Make an HTTP GET request to the URL
    response = requests.get(url)
    # Parse the HTML content of the page using BeautifulSoup
    page = BeautifulSoup(response.text, "html.parser")
    # Find the container holding the apartment listings
    # (index 10 selects the matching div that holds the listings)
    listings = page.find_all("div", class_="sc-a1f24150-0 AgbZX")[10]
    # Find the list of individual apartment sections within the container
    apartments = listings.find_all("ul", {"class": "sc-6cfad96a-0 jYHHlY"})[1]

    # Loop through each apartment section
    for apartment in apartments:
        # Helper to extract numeric text, stripping thousands separators
        def extract_cost(tag, attribute):
            try:
                return apartment.find(tag, attribute).text.replace(",", "")
            except AttributeError:
                return None

        # Helper to extract plain text from a matching element
        def extract_text(tag, attribute):
            try:
                return apartment.find(tag, attribute).text
            except AttributeError:
                return None

        # Extract apartment details: type, rental cost, location, and description
        apartment_type = extract_text("h3", {"class": "sc-c54bbc09-0 KhHgs sc-a7f125aa-8 hIkVmb"})
        rental_cost = extract_cost("div", {"class": "sc-d6110703-0 cOYpsS"})
        location = extract_text("div", {"class": "sc-3ae9d016-0 WYiPw"})
        description = extract_text("p", {"class": "sc-c54bbc09-0 loTzYZ sc-a7f125aa-10 jA-dWnw"})

        # Append the apartment details to the list
        apartment_rental_details.append([apartment_type, rental_cost, location, description])
# Create a Pandas DataFrame using the apartment_rental_details list
# Specify column names as "Apartment_type", "Rental_cost", "Location", and "Description"
df = pd.DataFrame(apartment_rental_details, columns=["Apartment_type", "Rental_cost", "Location", "Description"])
df.shape
(1840, 4)
# Keep rows where "Rental_cost" is not "Please Contact", and drop rows with missing values
df = df[df["Rental_cost"] != "Please Contact"].dropna()
# Reset the index, drop duplicate rows, and create a new DataFrame 'data'
df1 = df.reset_index(drop=True)
data = df1.drop_duplicates().reset_index(drop=True)
# Strip the leading dollar sign from the rental cost values
data["Rental_cost"] = data["Rental_cost"].str.strip("$")
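Note that after stripping the dollar sign, Rental_cost is still a string column; for any numeric analysis it would need to be cast. A minimal sketch of that conversion (the DataFrame here uses made-up values, not the scraped data):

```python
import pandas as pd

# Made-up cleaned rental costs, still stored as strings
sample = pd.DataFrame({"Rental_cost": ["2100", "1450.50", "995"]})

# errors="coerce" turns any leftover non-numeric value into NaN
sample["Rental_cost"] = pd.to_numeric(sample["Rental_cost"], errors="coerce")
print(sample["Rental_cost"].mean())
```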
data.isna().sum()
Apartment_type    0
Rental_cost       0
Location          0
Description       0
dtype: int64
data.shape
(1686, 4)
path = os.getcwd()
file = path + '/cool-ship-407420-9aadad67f295.json'
file
'/Users/mj/Projects/Projects/webscrapingProject/cool-ship-407420-9aadad67f295.json'
# Define the OAuth scopes for Google Sheets and Drive access
scope = ['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive', 'https://www.googleapis.com/auth/drive.file', 'https://www.googleapis.com/auth/spreadsheets']
creds = ServiceAccountCredentials.from_json_keyfile_name(file, scope)
client = gspread.authorize(creds)
# Create a new Google Sheets file with the name 'Rental Price Perspective'
spreadsheet = client.create('Rental Price Perspective')
# Spreadsheet key of the target Google Sheets file (copied from its URL)
spreadsheet_key = '1ixobCbTWoQAJ-vXYOuu3ruvcULtWtKlhIQXoTQpUk_8'
# Uncomment the following lines if you want to work with the default first sheet
# Get the default first sheet
# sheet = spreadsheet.sheet1
# Update the Google Sheets with the DataFrame
# sheet.update([data.columns.values.tolist()] + data.values.tolist())
# Define the worksheet name and starting cell for the data upload
worksheetName = 'Sheet1'
starting_cell = 'A1'
# Upload the data to Google Sheets using the d2g library
d2g.upload(data, spreadsheet_key, worksheetName, credentials=creds, col_names=True, row_names=False, start_cell=starting_cell, clean=False)
# Print a success message
print('The sheet is uploaded successfully')
The sheet is uploaded successfully